Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

$$\beta_k^{-1} = \frac{\sum_{i=1}^{N} f(k \mid x_i)\,(x_i - \mu_k)^2}{\sum_{i=1}^{N} f(k \mid x_i)}$$

$$w_k = \frac{1}{N} \sum_{i=1}^{N} f(k \mid x_i) \tag{2.11}$$

In the above equations, $f(k \mid x_i)$ is defined as below,

$$f(k \mid x_i) = \frac{w_k \, \mathcal{G}(x_i \mid \mu_k, \sigma_k^2)}{f(x_i)} \tag{2.12}$$
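As a rough illustration of how equations (2.11) and (2.12) fit together, the following Python sketch computes $f(k \mid x_i)$ for a small one-dimensional data set and then re-estimates the variances and weights. Several things here are assumptions rather than part of the text: $\beta_k^{-1}$ is read as the variance $\sigma_k^2$, $f(x_i)$ is taken to be the mixture density $\sum_k w_k \mathcal{G}(x_i \mid \mu_k, \sigma_k^2)$, the means are held fixed because their update is not shown in this passage, and the data values and function names are made up for the example.

```python
import math


def gaussian(x, mu, var):
    """Univariate Gaussian density G(x | mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(xs, weights, means, variances):
    """Equation (2.12): f(k | x_i) = w_k * G(x_i | mu_k, var_k) / f(x_i)."""
    resp = []
    for x in xs:
        joint = [w * gaussian(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        fx = sum(joint)  # assumption: f(x_i) is the mixture density
        resp.append([j / fx for j in joint])
    return resp


def update_variances_and_weights(xs, resp, means):
    """Equation (2.11): re-estimate beta_k^{-1} (read as a variance) and w_k."""
    n = len(xs)
    variances, weights = [], []
    for k, mu in enumerate(means):
        total = sum(r[k] for r in resp)  # sum_i f(k | x_i)
        weighted_sq = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, xs))
        variances.append(weighted_sq / total)
        weights.append(total / n)
    return variances, weights


# Hypothetical toy data and starting values, chosen only for illustration.
xs = [-2.1, -1.9, -2.0, 1.8, 2.0, 2.2]
weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]

resp = responsibilities(xs, weights, means, variances)
variances, weights = update_variances_and_weights(xs, resp, means)
```

Each row of `resp` sums to one, which is why the re-estimated weights also sum to one. In practice the two steps would be repeated until the parameters stop changing.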

The major difference between the non-parametric (kernel) approach and the semi-parametric approach is the number of kernels or components. In a kernel-based density estimation model, each data point may be used as a kernel; sometimes only a subset of the data points is employed as the kernels. All kernels of a kernel-based model normally share an identical variance, i.e., $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_M^2$, supposing M kernels are employed in the model. In a semi-parametric model, the number of components is much smaller than the number of kernels. Moreover, the components of a semi-parametric model have different variances, i.e., $\sigma_1^2 \neq \sigma_2^2 \neq \cdots \neq \sigma_M^2$ if M components are employed.

Figure 2.13 shows a comparison between a kernel-based model and a semi-parametric model for a data set. A kernel-based model constructed for the data set is shown in Figure 2.13(a), in which a random sample of data points was used as the kernels. A semi-parametric model constructed for the same data set is shown in Figure 2.13(b), in which two components were used. Although the two density estimators result in a similar density function, their basic principles are different and their computational costs are different as well. A kernel-based model is cheap to construct, but it is costly when used for inference on new data, because every kernel must be evaluated for each new point. A semi-parametric model requires more time to construct, but it is computationally cheap when used for inference on new data.
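The difference in inference cost can be made concrete with a small Python sketch. It evaluates a kernel-based estimate and a two-component mixture at one query point and records how many Gaussian evaluations each needs. Everything numeric here is an assumption for illustration and is not taken from Figure 2.13: the synthetic data, the choice of 50 kernels, the shared kernel variance, and the hand-set mixture parameters that stand in for a fitted model.

```python
import math
import random


def gaussian(x, mu, var):
    """Univariate Gaussian density G(x | mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def kernel_density(x, kernels, var):
    """Kernel-based estimate: one Gaussian per kernel, all sharing one variance."""
    return sum(gaussian(x, centre, var) for centre in kernels) / len(kernels)


def mixture_density(x, weights, means, variances):
    """Semi-parametric estimate: a few weighted Gaussians with their own variances."""
    return sum(w * gaussian(x, m, v)
               for w, m, v in zip(weights, means, variances))


random.seed(0)
# Hypothetical two-cluster data set.
data = ([random.gauss(-2.0, 0.5) for _ in range(200)]
        + [random.gauss(2.0, 1.0) for _ in range(200)])

# Kernel-based model: construction only selects kernel centres.
kernels = random.sample(data, 50)  # 50 is an arbitrary choice
shared_var = 0.25                  # arbitrary shared kernel variance

# Semi-parametric model: parameters set by hand here; a real model
# would obtain them from an iterative fitting procedure.
weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [0.25, 1.0]

query = 0.5
kde_value = kernel_density(query, kernels, shared_var)
mix_value = mixture_density(query, weights, means, variances)

# Gaussian evaluations needed per query point.
kde_evals = len(kernels)
mix_evals = len(weights)
```

With these settings the kernel-based model needs 50 Gaussian evaluations per query point and the mixture needs 2, and the gap grows with the number of kernels.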